Analysis of the Frequency and Impact of Bird-Aircraft Collisions in the United States¶
Project by Paola Vizcarra

- Setup
- Import Libraries
- Dataset Overview
- Name and Source
- Purpose
- Structure and Composition
- Size:
- Features
- Key Insights from Summary Statistics
- Numerical Variables
- Categorical Variables
- Data Quality Assessment
- Missing Data
- Balanced Data
- Duplicates
- Outliers
- Patterns and Relationships
- FlightPhase, Altitude - Features
- Pilots Warned - Feature
- Engine - Feature
- ConditionsPrecipitation - Feature
- MakeModel - Feature
- NumberStruckActual - Feature
- FlightDate - Feature
- OriginState - Feature
- Cost - Feature
- PeopleInjured - Feature
- Limitations and Challenges
- Data Gaps
- Constraints
- Applications and Relevance
- Research Questions
- Potential Use Cases
Import Libraries¶
Name and Source¶
This report uses the “Bird Strikes in Aviation: Aircraft Collisions” dataset from Kaggle.com at https://www.kaggle.com/datasets/iamtapendu/bird-strike-by-aircafts-data
Purpose¶
In aviation, one notable risk is bird strikes—collisions between aircraft and birds or other wildlife. These incidents have become more worrisome due to urban expansion and rising air traffic. Bird strikes can cause substantial damage to aircraft, particularly jet engines, and have been associated with fatal accidents. This analysis seeks to deepen our understanding of transportation safety by examining these risks.
Size:¶
Shape of data is : (25429, 26)
The shape of the dataset is (25429, 26) meaning it contains 25429 rows and 26 columns. Each row has information about a bird-aircraft collision, totaling 25429 incidents.
Features¶
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25429 entries, 0 to 25428
Data columns (total 26 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | RecordID | 25429 non-null | int64 |
| 1 | AircraftType | 25429 non-null | object |
| 2 | AirportName | 25429 non-null | object |
| 3 | AltitudeBin | 25429 non-null | object |
| 4 | MakeModel | 25429 non-null | object |
| 5 | NumberStruck | 25429 non-null | object |
| 6 | NumberStruckActual | 25429 non-null | int64 |
| 7 | Effect | 2078 non-null | object |
| 8 | FlightDate | 25429 non-null | object |
| 9 | Damage | 25429 non-null | object |
| 10 | Engines | 25195 non-null | object |
| 11 | Operator | 25429 non-null | object |
| 12 | OriginState | 24980 non-null | object |
| 13 | FlightPhase | 25429 non-null | object |
| 14 | ConditionsPrecipitation | 2015 non-null | object |
| 15 | RemainsCollected? | 25429 non-null | bool |
| 16 | RemainsSentToSmithsonian | 25429 non-null | bool |
| 17 | Remarks | 20668 non-null | object |
| 18 | WildlifeSize | 25429 non-null | object |
| 19 | ConditionsSky | 25429 non-null | object |
| 20 | WildlifeSpecies | 25429 non-null | object |
| 21 | PilotWarned | 25429 non-null | object |
| 22 | Cost | 25429 non-null | object |
| 23 | Altitude | 25429 non-null | int64 |
| 24 | PeopleInjured | 25429 non-null | int64 |
| 25 | IsAircraftLarge? | 25429 non-null | object |
dtypes: bool(2), int64(4), object(20)
memory usage: 4.7+ MB
Information about the dataset¶
- The output elements are as follows:
- Column: the name of the column
- Non-null Count: how many non-null values are found in the column
- Dtype: the data type of each column (int64 = integer, object = string, bool = boolean True/False)
- In the "Effect" column, there are 2078 non-null values, meaning there are 23351 null values. Since only 8.17% of the rows have a value, this feature can be dropped from the model.
- The "ConditionsPrecipitation" column has only 2015 non-null values. Although 92.08% of the values are missing, we might still analyze this feature, as it may help deduce how weather affects bird-aircraft collisions.
- The column "Engines" has 25195 non-null values, meaning only 0.92% of the values are missing. Options to handle missing data include:
- Use the "AircraftType" column to search and gather the missing information from another source
- Drop the rows with missing values
- Replace the missing values with the feature's median.
- The "OriginState" column contains only 449 null values. However, upon reviewing the data, it was found that values in uppercase represent locations outside the United States. Since the scope of this analysis only includes the U.S., both types of occurrences will be dropped.
- The "Remarks" column also has null values, but this column has no pertinent information. It will be dropped for this analysis.
- The remaining columns do not have any null values.
- The column "WildlifeSpecies" has no null values, but about 61% of its values are classified as "Unknown" (see the count below), so the column will be dropped.
- Not all features contain pertinent data. The following features will be considered in the model:
- int64(3): NumberStruckActual, Altitude, and PeopleInjured
- object(12): AirportName, AltitudeBin, MakeModel, FlightDate, Damage, Engines, OriginState, FlightPhase, WildlifeSize, ConditionsSky, PilotWarned, Cost
- Object type will be converted from strings into numbers, as machine learning algorithms require numerical data to function.
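The string-to-number conversion mentioned in the last point could look like the following minimal sketch. The three-row sample frame and the `WildlifeSizeCode` column are illustrative assumptions; the 0/1 mapping for "Damage" and "IsAircraftLarge?" matches the encoded values that appear in later table previews, but the exact encoding code used in this notebook is not shown.

```python
import pandas as pd

# Hypothetical three-row sample that mimics a few columns of the dataset.
df = pd.DataFrame({
    "Damage": ["Caused damage", "No damage", "No damage"],
    "IsAircraftLarge?": ["Yes", "No", "No"],
    "WildlifeSize": ["Medium", "Small", "Large"],
})

# Two-valued string columns map naturally to 0/1.
df["Damage"] = df["Damage"].map({"No damage": 0, "Caused damage": 1})
df["IsAircraftLarge?"] = df["IsAircraftLarge?"].map({"No": 0, "Yes": 1})

# Ordered categories can be mapped to increasing integers
# (WildlifeSizeCode is an assumed name, not a column of the dataset).
size_order = {"Small": 0, "Medium": 1, "Large": 2}
df["WildlifeSizeCode"] = df["WildlifeSize"].map(size_order)

print(df)
```

An explicit mapping keeps the meaning of each code visible, which helps when reading correlation tables later on.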
*Column "WildlifeSpecies": count "unknown" values¶
The string "unknown" appears 15586 times, meaning that 61.29% of the values lack useful information.
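A count like this can be obtained with a case-insensitive substring match. The sketch below uses a small hypothetical sample of the column rather than the full dataset, so its numbers differ from the ones reported above.

```python
import pandas as pd

# Hypothetical sample of the WildlifeSpecies column.
species = pd.Series([
    "Unknown bird - medium", "Rock pigeon", "European starling",
    "Unknown bird - large", "Mallard",
])

# Case-insensitive substring match, then the count and share of matches.
is_unknown = species.str.contains("unknown", case=False)
unknown_count = int(is_unknown.sum())
unknown_share = 100 * unknown_count / len(species)

print(f"{unknown_count} of {len(species)} values ({unknown_share:.2f}%) are unknown")
```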
| | RecordID | AircraftType | AirportName | AltitudeBin | MakeModel | NumberStruck | NumberStruckActual | Effect | FlightDate | Damage | ... | RemainsSentToSmithsonian | Remarks | WildlifeSize | ConditionsSky | WildlifeSpecies | PilotWarned | Cost | Altitude | PeopleInjured | IsAircraftLarge? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 202152 | Airplane | LAGUARDIA NY | (1000, 2000] | B-737-400 | Over 100 | 859 | Engine Shut Down | 11/23/00 0:00 | Caused damage | ... | False | FLT 753. PILOT REPTD A HUNDRED BIRDS ON UNKN T... | Medium | No Cloud | Unknown bird - medium | N | 30,736 | 1500 | 0 | Yes |
| 1 | 208159 | Airplane | DALLAS/FORT WORTH INTL ARPT | (-1, 0] | MD-80 | Over 100 | 424 | NaN | 7/25/01 0:00 | Caused damage | ... | False | 102 CARCASSES FOUND. 1 LDG LIGHT ON NOSE GEAR ... | Small | Some Cloud | Rock pigeon | Y | 0 | 0 | 0 | No |
| 2 | 207601 | Airplane | LAKEFRONT AIRPORT | (30, 50] | C-500 | Over 100 | 261 | NaN | 9/14/01 0:00 | No damage | ... | False | FLEW UNDER A VERY LARGE FLOCK OF BIRDS OVER AP... | Small | No Cloud | European starling | N | 0 | 50 | 0 | No |
| 3 | 215953 | Airplane | SEATTLE-TACOMA INTL | (30, 50] | B-737-400 | Over 100 | 806 | Precautionary Landing | 9/5/02 0:00 | No damage | ... | False | NOTAM WARNING. 26 BIRDS HIT THE A/C, FORCING A... | Small | Some Cloud | European starling | Y | 0 | 50 | 0 | Yes |
| 4 | 219878 | Airplane | NORFOLK INTL | (30, 50] | CL-RJ100/200 | Over 100 | 942 | NaN | 6/23/03 0:00 | No damage | ... | False | NO DMG REPTD. | Small | No Cloud | European starling | N | 0 | 50 | 0 | No |
5 rows × 26 columns
Name of Columns¶
Index(['RecordID', 'AircraftType', 'AirportName', 'AltitudeBin', 'MakeModel',
'NumberStruck', 'NumberStruckActual', 'Effect', 'FlightDate', 'Damage',
'Engines', 'Operator', 'OriginState', 'FlightPhase',
'ConditionsPrecipitation', 'RemainsCollected?',
'RemainsSentToSmithsonian', 'Remarks', 'WildlifeSize', 'ConditionsSky',
'WildlifeSpecies', 'PilotWarned', 'Cost', 'Altitude', 'PeopleInjured',
'IsAircraftLarge?'],
dtype='object')
Variable Description and Category of Features:¶
Categorical/Nominal: These features describe qualities or labels with no inherent order:
- AircraftType: Identifies the type of aircraft.
- AirportName: Specifies the airport's name.
- MakeModel: Manufacturer and model of the aircraft.
- Operator: The airline managing the aircraft.
- OriginState: U.S. state where the flight started.
- FlightPhase: Describes the stage of flight (e.g., takeoff, landing).
- WildlifeSize: Classifies the size of the bird(s).
- WildlifeSpecies: Identifies the bird species.
- ConditionsSky: Describes sky conditions during the incident.
- ConditionsPrecipitation: Indicates precipitation conditions.
- Damage: Description of the type of damage.
- Remarks: Additional notes or comments.
Binary: These features have two possible values (e.g., "Yes"/"No"):
- Effect: Whether the incident caused damage.
- RemainsCollected?: Indicates if remains were collected.
- RemainsSentToSmithsonian: If remains were sent for analysis.
- PilotWarned: If the pilot was alerted about potential bird strikes.
- IsAircraftLarge?: Whether the aircraft is classified as large.
Ordinal: These features have categories with a meaningful order or rank:
- AltitudeBin: Grouped altitude ranges, indicating height in increasing levels.
Continuous: These are numeric features with an infinite number of possible values:
- Cost: The financial cost of the incident.
- Altitude: The precise altitude (in feet) during the incident.
Count: These features represent whole numbers or counts:
- NumberStruck: Estimated count of birds struck.
- NumberStruckActual: Actual count of birds struck.
- Engines: Total number of engines involved in the incident.
- PeopleInjured: Number of individuals injured during the incident.
Numerical Variables¶
Retrieve information about the numerical features.¶
| | RecordID | NumberStruckActual | Altitude | PeopleInjured |
|---|---|---|---|---|
| count | 25429.000000 | 25429 | 25429.000000 | 25429 |
| mean | 253800.148767 | 3 | 799.028432 | 0 |
| std | 38472.800499 | 13 | 1740.079843 | 0 |
| min | 1195.000000 | 1 | 0.000000 | 0 |
| 25% | 225742.000000 | 1 | 0.000000 | 0 |
| 50% | 248609.000000 | 1 | 50.000000 | 0 |
| 75% | 269044.000000 | 1 | 700.000000 | 0 |
| max | 321909.000000 | 942 | 18000.000000 | 6 |
Observations:¶
- "RecordID" does not provide any useful information, so this feature will be dropped. The remaining features: "NumberStruckActual", "Altitude", and "PeopleInjured" do provide useful information and the values retrieved will be necessary for the final analysis.
- The "NumberStruckActual" feature represents the actual number of birds struck during each incident. All 25,429 incidents are accounted for, with a mean of 3 birds struck per incident. While one incident involved as many as 942 birds struck, the third quartile value is just 1, indicating that the data is heavily skewed. To better understand the data distribution, we will calculate the confidence interval.
- Likewise, we'll calculate the confidence interval for the "Altitude" feature to find any inconsistencies in the data.
- As for "PeopleInjured", the mean and standard deviation both round to 0 while the maximum is 6, indicating that injuries are very rare and the data has very little variability.
Finding the Confidence Interval for the mean of Numeric Values¶
95% Confidence Intervals for Numerical Columns:
Confidence Interval for NumberStruckActual: [2.542, 2.857]
Confidence Interval for Altitude: [777.641, 820.416]
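For reference, an interval of this kind can be reproduced from the summary statistics alone. The sketch below uses the normal approximation, which is an assumption (the notebook may have used a t-distribution, which is nearly identical at this sample size); the function name `mean_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, std, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * std / sqrt(n)
    return mean - margin, mean + margin

# Summary statistics for Altitude, taken from the describe() table above.
low, high = mean_confidence_interval(mean=799.028432, std=1740.079843, n=25429)
print(f"Confidence Interval for Altitude: [{low:.3f}, {high:.3f}]")
```

With a standard deviation more than twice the mean, the interval is narrow only because the sample is large; it describes the uncertainty of the mean, not the spread of individual altitudes.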
Visualize incidents in relation to Altitude¶
Uni-variate Analysis¶
Categorical Variables¶
Finding the Confidence Interval for the Proportions of Categorical Values¶
Proportions and 95% Confidence Intervals for Categorical Values:
No damage: Proportion = 0.903, CI = [0.900, 0.907]
Caused damage: Proportion = 0.097, CI = [0.093, 0.100]
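A proportion interval like the one above can be sketched with the Wald (normal-approximation) formula. Whether the notebook used exactly this formula is an assumption; the counts are the "Damage" value counts reported in the Balanced Data section, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Wald (normal-approximation) interval for a binomial proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin

# 2454 of 25429 incidents caused damage.
p, low, high = proportion_confidence_interval(2454, 25429)
print(f"Caused damage: Proportion = {p:.3f}, CI = [{low:.3f}, {high:.3f}]")
```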
Visualize incidents in relation to Damage¶
Uni-variate Analysis¶
Missing Data¶
RecordID 0
AircraftType 0
AirportName 0
AltitudeBin 0
MakeModel 0
NumberStruck 0
NumberStruckActual 0
Effect 23351
FlightDate 0
Damage 0
Engines 234
Operator 0
OriginState 449
FlightPhase 0
ConditionsPrecipitation 23414
RemainsCollected? 0
RemainsSentToSmithsonian 0
Remarks 4761
WildlifeSize 0
ConditionsSky 0
WildlifeSpecies 0
PilotWarned 0
Cost 0
Altitude 0
PeopleInjured 0
IsAircraftLarge? 0
dtype: int64
Observations:¶
The findings reinforce what was discussed before:
- Columns "Effect" and "ConditionsPrecipitation" have over 90% missing values. The "Effect" feature will be dropped.
- In column "Remarks", only about 19% of the values are null, but as seen before, the information in this column is not pertinent to this analysis.
- As for "Engines" and "OriginState", rows containing null or unusable values will be handled accordingly.
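The percentages quoted in these observations can be derived directly from the null counts. The following is a minimal sketch on a hypothetical four-row sample; `missing_report` and its `threshold` parameter are illustrative names, and the 50% threshold is chosen only so that the tiny sample produces a drop candidate.

```python
import pandas as pd

def missing_report(frame, threshold=90.0):
    """Return the percentage of missing values per column and the
    columns whose missing share exceeds the threshold."""
    pct = frame.isna().mean().mul(100).round(2)
    return pct, pct[pct > threshold].index.tolist()

# Hypothetical sample with gaps, standing in for the full dataset.
df = pd.DataFrame({
    "Effect": [None, "Engine Shut Down", None, None],
    "Engines": ["2", "2", None, "1"],
    "Damage": ["No damage", "Caused damage", "No damage", "No damage"],
})

pct, drop_candidates = missing_report(df, threshold=50.0)
print(pct.sort_values(ascending=False))
print("Drop candidates:", drop_candidates)
```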
Balanced Data¶
Get information about balanced or imbalanced data sets which may affect the preprocessing or algorithm.
Damage - Feature¶
['Caused damage' 'No damage']
Damage
No damage        22975
Caused damage     2454
Name: count, dtype: int64

Damage
No damage        90.349601
Caused damage     9.650399
Name: count, dtype: float64
Observations:¶
Of 25429 incidents, 2454 caused damage, accounting for only 9.65% of the data. This indicates a strong imbalance.
Duplicates¶
WildlifeSpecies and WildlifeSize - Features:¶
| | WildlifeSpecies | WildlifeSize |
|---|---|---|
| 0 | Unknown bird - medium | Medium |
| 1 | Rock pigeon | Small |
| 2 | European starling | Small |
| 3 | European starling | Small |
| 4 | European starling | Small |
| ... | ... | ... |
| 25424 | Mallard | Medium |
| 25425 | Unknown bird - large | Large |
| 25426 | Tree swallow | Small |
| 25427 | Unknown bird - medium | Medium |
| 25428 | Red-tailed hawk | Medium |

[25429 rows x 2 columns]
The WildlifeSpecies feature indicates the animal species involved in the incident and sometimes includes information about its size. WildlifeSize provides only the size of the animal involved. Due to this, it would be safe to either ignore 'WildlifeSpecies' (since we see later that many fields have "Unknown" as a description) or extract the pertinent data.
Outliers¶
Cost - Features:¶
Prepare "Cost" data for analysis.
- Clean data by removing commas
- Create new column "CostAsFloat" with converted data.
float64
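The two cleaning steps listed above can be sketched as follows. The sample values are "Cost" strings that appear elsewhere in this report; the exact code used in the notebook is not shown, so this is one plausible way to do it.

```python
import pandas as pd

# Sample Cost strings as they appear in the dataset.
df = pd.DataFrame({"Cost": ["30,736", "0", "12,397,751"]})

# Strip the thousands separators, then convert the strings to floats.
df["CostAsFloat"] = df["Cost"].str.replace(",", "", regex=False).astype(float)

print(df["CostAsFloat"].dtype)
```

Keeping the original "Cost" column alongside "CostAsFloat" makes it easy to spot-check the conversion.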
Note that in certain situations, as we see here, outliers should not be removed. When the data is highly skewed, extreme values may be representative of the distribution's tail. Removing them could lead to a loss of valuable information.
- Over 20,000 of the incidents caused no damage, while fewer than 5,000 did cause damage. This reiterates the information seen above.
Analyzing dataset features:
A. FlightPhase, Altitude - Features¶
Visualize the relationship between Altitude and FlightPhase
[Plot: x-axis FlightPhase, y-axis Altitude]
Countplot
Observations:¶
- Over 10000 incidents happened during Approach, which is when the aircraft aligns with the runway and prepares to land.
- Above 4000 incidents took place at each of the following phases: Climb, Landing Roll, and Take-off run.
- Although few in number, some incidents happened during the Parked and Taxi phases.
Plot the Heatmap¶
Visualize count in Flight Phase and Altitude levels (both are linked)¶
FlightPhase Damage
Approach Caused damage 985
No damage 9397
Climb Caused damage 694
No damage 3735
Descent Caused damage 157
No damage 619
Landing Roll Caused damage 236
No damage 4811
Parked Caused damage 2
No damage 8
Take-off run Caused damage 376
No damage 4335
Taxi Caused damage 4
No damage 70
dtype: int64
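Because the phases differ so much in size, the raw counts are easier to compare as damage rates. The sketch below hard-codes the counts from the output above and derives the share of damaging incidents per phase; the variable names are illustrative.

```python
# (caused damage, no damage) counts per flight phase, from the output above.
counts = {
    "Approach": (985, 9397),
    "Climb": (694, 3735),
    "Descent": (157, 619),
    "Landing Roll": (236, 4811),
    "Parked": (2, 8),
    "Take-off run": (376, 4335),
    "Taxi": (4, 70),
}

# Percentage of incidents in each phase that caused damage.
damage_rate = {
    phase: 100 * damaged / (damaged + undamaged)
    for phase, (damaged, undamaged) in counts.items()
}

# List phases from the highest to the lowest damage rate.
for phase, rate in sorted(damage_rate.items(), key=lambda kv: kv[1], reverse=True):
    total = sum(counts[phase])
    print(f"{phase:<13} {total:>6} incidents  {rate:5.1f}% with damage")
```

Rates for very small groups such as Parked (10 incidents) are unstable and should be read with caution.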
Show how Damage is distributed across Flight Phase and Altitude levels¶
Pie chart displaying the proportion of incidents at each Flight Phase¶
array(['Climb', 'Landing Roll', 'Approach', 'Take-off run', 'Descent',
'Taxi', 'Parked'], dtype=object)
Observations:¶
- The data shows that most incidents take place during the Approach phase (10,382 of 25,429), and most of these caused no damage to the aircraft.
- Similar numbers of incidents take place during Climb, Landing Roll, and Take-off run (between about 4,400 and 5,000 each).
- During the Descent phase, fewer incidents occurred (776), maybe because the aircraft encountered fewer birds at that altitude. However, a larger share of these incidents resulted in damage (about 20%, versus about 9% during Approach).
- The number of incidents in each phase corresponds to the altitude at which birds are most likely to be present.
Correlation¶
Data shows that most incidents took place during Approach, although the vast majority did not cause damage.
Trend¶
The trend suggests that more incidents occurred during the Approach phase, but these incidents were generally less severe and caused no damage. On the other hand, fewer incidents took place during the Descent phase, but there was a higher likelihood of these incidents resulting in damage.
Note: This could be due to the nature of the incidents or the fact that there are fewer obstacles like birds at lower altitudes during descent.
B. Pilots Warned - Feature¶
Explore the counts of PilotWarned and Damage
| | FlightYear | PilotWarned | Damage | Counts |
|---|---|---|---|---|
| 0 | 2000 | N | Caused damage | 146 |
| 1 | 2000 | N | No damage | 789 |
| 2 | 2000 | Y | Caused damage | 50 |
| 3 | 2000 | Y | No damage | 382 |
| 4 | 2001 | N | Caused damage | 88 |
| 5 | 2001 | N | No damage | 707 |
| 6 | 2001 | Y | Caused damage | 35 |
| 7 | 2001 | Y | No damage | 400 |
| 8 | 2002 | N | Caused damage | 123 |
| 9 | 2002 | N | No damage | 976 |
| 10 | 2002 | Y | Caused damage | 63 |
| 11 | 2002 | Y | No damage | 519 |
| 12 | 2003 | N | Caused damage | 117 |
| 13 | 2003 | N | No damage | 889 |
| 14 | 2003 | Y | Caused damage | 84 |
| 15 | 2003 | Y | No damage | 478 |
| 16 | 2004 | N | Caused damage | 117 |
| 17 | 2004 | N | No damage | 951 |
| 18 | 2004 | Y | Caused damage | 67 |
| 19 | 2004 | Y | No damage | 557 |
| 20 | 2005 | N | Caused damage | 122 |
| 21 | 2005 | N | No damage | 1079 |
| 22 | 2005 | Y | Caused damage | 58 |
| 23 | 2005 | Y | No damage | 594 |
| 24 | 2006 | N | Caused damage | 141 |
| 25 | 2006 | N | No damage | 1124 |
| 26 | 2006 | Y | Caused damage | 91 |
| 27 | 2006 | Y | No damage | 803 |
| 28 | 2007 | N | Caused damage | 139 |
| 29 | 2007 | N | No damage | 1248 |
| 30 | 2007 | Y | Caused damage | 78 |
| 31 | 2007 | Y | No damage | 836 |
| 32 | 2008 | N | Caused damage | 126 |
| 33 | 2008 | N | No damage | 1121 |
| 34 | 2008 | Y | Caused damage | 92 |
| 35 | 2008 | Y | No damage | 919 |
| 36 | 2009 | N | Caused damage | 149 |
| 37 | 2009 | N | No damage | 1463 |
| 38 | 2009 | Y | Caused damage | 101 |
| 39 | 2009 | Y | No damage | 1534 |
| 40 | 2010 | N | Caused damage | 140 |
| 41 | 2010 | N | No damage | 1349 |
| 42 | 2010 | Y | Caused damage | 114 |
| 43 | 2010 | Y | No damage | 1518 |
| 44 | 2011 | N | Caused damage | 124 |
| 45 | 2011 | N | No damage | 1339 |
| 46 | 2011 | Y | Caused damage | 89 |
| 47 | 2011 | Y | No damage | 1400 |
Observations:¶
- The charts and counts above are consistent with the expected trend: incidents in which pilots were warned caused damage slightly less often (about 8.5% of warned incidents versus about 10.5% of unwarned ones).
- However, pilots were warned in only about 43% of all incidents, roughly 15 percentage points fewer than the incidents in which they were not warned.
Correlation¶
For incidents that did not cause damage, the number of incidents when the pilot was warned was significantly lower before 2009. In contrast, for incidents causing damage, there was no substantial difference between the incidents when the pilot was warned and when they were not.
C. Engine - Feature¶
- Engine feature contains null values. These will be filled with the median, or the most frequent value.
2
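Filling the gaps with the most frequent value could be sketched as below. The sample series is hypothetical, and converting with `pd.to_numeric` is an assumption made because "Engines" is stored with the object dtype; values that cannot be parsed become NaN and are filled as well.

```python
import pandas as pd

# Hypothetical sample; Engines is stored as strings with some gaps.
engines = pd.Series(["2", "2", None, "1", "4", "2", None])

# Convert to numbers; anything that cannot be parsed becomes NaN.
engines_num = pd.to_numeric(engines, errors="coerce")

# mode() ignores NaN and returns the most frequent value(s); take the first.
most_frequent = engines_num.mode().iloc[0]
engines_filled = engines_num.fillna(most_frequent)

print(most_frequent)
```

For a small set of discrete values such as engine counts, the mode is usually easier to justify than the median, since it is always a value that actually occurs.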
To visualize the distribution, we'll remove null values.
Faceted plot, each plot corresponds to a different subset of the data based on the "Engine" column.¶
Plotting Probability Density Functions (PDFs) for 'Engines' based on the 'Damage' column¶
Violin plot¶
Damage by NumberStruckActual:

| Damage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 99 | 100 | 227 | 261 | 320 | 424 | 537 | 806 | 859 | 942 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Caused damage | 1789.0 | 71.0 | 61.0 | 63.0 | 69.0 | 57.0 | 63.0 | 66.0 | 66.0 | 68.0 | ... | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN |
| No damage | 19001.0 | 396.0 | 408.0 | 420.0 | 467.0 | 406.0 | 432.0 | 392.0 | 421.0 | 393.0 | ... | 4.0 | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 |

[2 rows x 106 columns]

Damage by PeopleInjured:

| Damage | 0 | 1 | 2 | 6 |
|---|---|---|---|---|
| Caused damage | 2441.0 | 9.0 | 3.0 | 1.0 |
| No damage | 22975.0 | NaN | NaN | NaN |

Damage by IsAircraftLarge?:

| Damage | No | Yes |
|---|---|---|
| Caused damage | 1850 | 604 |
| No damage | 15177 | 7798 |
Observations:¶
- NumberStruckActual: Most incidents involved a single bird (20,790 of 25,429).
- PeopleInjured: People were injured in 13 incidents, all of which also caused damage.
- IsAircraftLarge?: Most incidents, with or without damage, involved aircraft not classified as large. Large aircraft were involved in 7,798 non-damaging incidents but only 604 damaging ones, a lower damage rate (about 7%) than for other aircraft (about 11%).
Bi-variate analysis¶
Box Plot
The number of engines did not appear to affect the number of people injured. While it seems that larger wildlife had a greater impact on 1-engine aircraft, the incident in which 6 people were injured involved small wildlife. Additionally, the number of incidents with injured people is too small to draw definitive conclusions (fortunately, there have not been many dangerous incidents).
There does not seem to be a relationship between the number of birds struck and engine damage; incidents involving large numbers of birds were too few to tell.
Observations:¶
- Most of the incidents involved 2 engine aircraft.
- Of those incidents, most did not result in damages.
- As for aircraft with 1, 3, and 4 engines, the difference between the resulting damage/no damage was not as great.
- Interestingly, the violin plot shows that when the incident involved a 2-engine aircraft and resulted in damage, medium-sized wildlife caused damage similar to that caused by large wildlife.
Correlation¶
The data shows weak correlations overall, with some indications that 2-engine aircraft are involved in more incidents, and larger aircraft are less likely to experience damaging incidents.
Trend¶
The trend suggests that two-engine aircraft have been involved in a higher number of incidents.
Note: This could imply that two-engine aircraft are more prone to incidents, or it could be that there are simply more of these aircraft in existence, leading to a higher number of incidents.
D. ConditionsPrecipitation - Feature¶
Note that this feature contains only 2,015 non-null values, so the analysis is preliminary and should be interpreted with caution.
[Plot: incidents by ConditionsPrecipitation; x-axis categories: Snow; Fog; Rain; Fog, Rain; Rain, Snow; Fog, Rain, Snow; Fog, Snow]
Observations:¶
- Although the sample data is limited, it provides a reasonable insight into how weather impacts bird-aircraft collision occurrences.
- Most incidents occurred during "Rain", although most did not result in damages.
- During "Rain, Snow" conditions, incidents were more likely to cause damage; however, these conditions accounted for only a small number of occurrences.
Trend¶
Among incidents with recorded precipitation, rain is the most common condition, but these incidents are generally less likely to cause damage. When rain and snow occur together, incidents are more likely to result in damage, although these conditions occur less frequently.
E. MakeModel - Feature¶
Observations:¶
- Among all the models involved in bird-aircraft collisions, only a small number experienced repeated incidents.
- Aircraft model B-737-700 (2-engine) had the most incidents.
- As seen before, most of the models involved in incidents had 2 engines.
Trend¶
The trend indicates a strong association between the number of engines and the likelihood of involvement in collision.
Note: As stated before, this could imply that two-engine aircraft are more prone to incidents, or it could be that there are simply more of these aircraft in existence, leading to a higher number of incidents.
F. NumberStruckActual - Feature¶
Pie chart to visualize the distribution of the number of birds struck on a single incident¶
Multivariate analysis¶
Correlation Between NumberStruckActual, Damage, Engines, IsAircraftLarge? and PeopleInjured
Damage
No damage        22975
Caused damage     2454
Name: count, dtype: int64
The inference we can draw from this table is:
The heatmap shows there is little correlation between the considered factors.
There is some negative correlation between the number of Engines and Damage. As we've seen the 2-engine aircraft had the most incidents.
There is a very weak positive correlation between Engines and the number of PeopleInjured.
| Damage | Engines | IsAircraftLarge? | NumberStruckActual | PeopleInjured |
|---|---|---|---|---|
| 0 | 2.011665 | 0.339412 | 2.485745 | 0.000000 |
| 1 | 1.931540 | 0.246129 | 4.702119 | 0.008557 |
Observations:¶
- Over 80% of incidents involved only one bird.
- The vast majority of incidents in which only one bird was involved did not cause damage.
- Overall, most incidents did not cause damage.
Trend¶
The data show that most incidents involved only one bird.
G. FlightDate - Feature¶
Explore Trends Over Time: Yearly distribution of bird strike incidents. We need to group FlightDate column by year
| | RecordID | AircraftType | AirportName | AltitudeBin | MakeModel | NumberStruck | NumberStruckActual | Effect | FlightDate | Damage | ... | ConditionsSky | WildlifeSpecies | PilotWarned | Cost | Altitude | PeopleInjured | IsAircraftLarge? | CostAsFloat | FlightYear | NumberStruckBinned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25424 | 319672 | Airplane | SACRAMENTO INTL | (0, 10] | B-737-700 | 1 | 1 | NaN | 2011-12-29 | 0 | ... | No Cloud | Mallard | Y | 0 | 0-1000 | 0 | 1 | 0.0 | 2011 | 1 |
| 25425 | 321151 | Airplane | REDDING MUNICIPAL | (1000, 2000] | EMB-120 | 1 | 1 | NaN | 2011-12-30 | 0 | ... | Overcast | Unknown bird - large | N | 0 | 1000-3000 | 0 | 0 | 0.0 | 2011 | 1 |
| 25426 | 319677 | Airplane | ORLANDO INTL | (-1, 0] | A-321 | 1 | 1 | NaN | 2011-12-30 | 0 | ... | Some Cloud | Tree swallow | Y | 0 | NaN | 0 | 0 | 0.0 | 2011 | 1 |
| 25427 | 319679 | Airplane | DETROIT METRO WAYNE COUNTY ARPT | (-1, 0] | B-757-200 | 1 | 1 | NaN | 2011-12-31 | 0 | ... | Some Cloud | Unknown bird - medium | Y | 0 | NaN | 0 | 1 | 0.0 | 2011 | 1 |
| 25428 | 319593 | Airplane | ABRAHAM LINCOLN CAPITAL ARPT | (-1, 0] | B-737-400 | 1 | 1 | NaN | 2011-12-31 | 1 | ... | No Cloud | Red-tailed hawk | N | 0 | NaN | 0 | 1 | 0.0 | 2011 | 1 |
5 rows × 29 columns
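The "FlightYear" column shown in the table can be derived from "FlightDate" roughly as follows. This is a sketch on three sample dates in the raw format shown in the first preview (for example `11/23/00 0:00`); the format string is an assumption based on those samples.

```python
import pandas as pd

# Sample rows using the raw FlightDate format from the dataset preview.
df = pd.DataFrame({
    "FlightDate": ["11/23/00 0:00", "7/25/01 0:00", "9/14/01 0:00"],
    "Damage": ["Caused damage", "Caused damage", "No damage"],
})

# Parse the strings into datetimes, then extract the year.
df["FlightDate"] = pd.to_datetime(df["FlightDate"], format="%m/%d/%y %H:%M")
df["FlightYear"] = df["FlightDate"].dt.year

# Count incidents per year and damage outcome.
per_year = df.groupby(["FlightYear", "Damage"]).size().reset_index(name="Counts")
print(per_year)
```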
Correlation Between Year and Damage¶
Observations¶
- Although 2009 had the most incidents, the number of incidents that caused damage has been fairly constant through the years.
- From 2000 to 2008 the incidents gradually increased, before jumping steeply in 2009. After that, incidents started decreasing.
Trend¶
The overall trend is upward. Data shows initial growth over 9 years, followed by a decrease over the next 3 years.
H. OriginState - Feature¶
Correlation Between OriginState and Damage for the top 30 incidents
Top 3 states with the highest incident counts:
OriginState
California    2499
Texas         2445
Florida       2045
Name: count, dtype: int64
Comments¶
- California, Texas, and Florida have more incidents than any of the other states.
- It is fair to say that incidents have happened in all of U.S. territory.
Trend¶
Incident counts are concentrated in 3 key states: California, Texas, and Florida.
I. Cost - Feature¶
Note: "CostAsFloat" column was added above, on: 5. Data Quality Assessment: Outliers for analysis.
| | RecordID | AircraftType | AirportName | AltitudeBin | MakeModel | NumberStruck | NumberStruckActual | Effect | FlightDate | Damage | ... | ConditionsSky | WildlifeSpecies | PilotWarned | Cost | Altitude | PeopleInjured | IsAircraftLarge? | CostAsFloat | FlightYear | NumberStruckBinned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 202152 | Airplane | LAGUARDIA NY | (1000, 2000] | B-737-400 | Over 100 | 859 | Engine Shut Down | 2000-11-23 | 1 | ... | No Cloud | Unknown bird - medium | N | 30,736 | 1000-3000 | 0 | 1 | 30736.0 | 2000 | NaN |
| 1 | 208159 | Airplane | DALLAS/FORT WORTH INTL ARPT | (-1, 0] | MD-80 | Over 100 | 424 | NaN | 2001-07-25 | 1 | ... | Some Cloud | Rock pigeon | Y | 0 | NaN | 0 | 0 | 0.0 | 2001 | NaN |
| 2 | 207601 | Airplane | LAKEFRONT AIRPORT | (30, 50] | C-500 | Over 100 | 261 | NaN | 2001-09-14 | 0 | ... | No Cloud | European starling | N | 0 | 0-1000 | 0 | 0 | 0.0 | 2001 | NaN |
| 3 | 215953 | Airplane | SEATTLE-TACOMA INTL | (30, 50] | B-737-400 | Over 100 | 806 | Precautionary Landing | 2002-09-05 | 0 | ... | Some Cloud | European starling | Y | 0 | 0-1000 | 0 | 1 | 0.0 | 2002 | NaN |
| 4 | 219878 | Airplane | NORFOLK INTL | (30, 50] | CL-RJ100/200 | Over 100 | 942 | NaN | 2003-06-23 | 0 | ... | No Cloud | European starling | N | 0 | 0-1000 | 0 | 0 | 0.0 | 2003 | NaN |
5 rows × 29 columns
Create a histogram to visualize the distribution of Cost
Top 5 highest values in CostAsFloat and corresponding 'PeopleInjured':
CostAsFloat PeopleInjured WildlifeSpecies
232 12397751.0 2 White-tailed deer
4501 5704387.0 0 Canada goose
13796 5704387.0 0 White-tailed kite
23407 4570000.0 0 Bald eagle
291 3644483.0 0 Canada goose
Observations¶
- Although there was one very costly incident, most are not as expensive.
- The most costly incident resulted in 2 people being injured, and it was not a collision with a bird (it involved a white-tailed deer).
Trend¶
The most costly incident injured 2 people but involved a white-tailed deer rather than a bird, so the highest costs in this dataset are not always attributable to bird strikes.
J. PeopleInjured - Feature¶
Number of people injured: 13
Exploring the incidents where people sustained injuries.
Total Incidents: 25429
There were 13 incidents in which people sustained injuries, resulting in a total of 21 individuals injured.
| | PeopleInjured | MakeModel | Engines | FlightPhase | WildlifeSize | WildlifeSpecies | ConditionsSky | Cost |
|---|---|---|---|---|---|---|---|---|
| 4422 | 6 | LEARJET-24 | 2 | Climb | Small | Unknown bird - small | No Cloud | 926,070 |
| 232 | 2 | LEARJET-60 | 2 | Landing Roll | Large | White-tailed deer | No Cloud | 12,397,751 |
| 7180 | 2 | C-172 | 1 | Descent | Large | Turkey vulture | Some Cloud | 1,382 |
| 24879 | 2 | MAULE M-7 | 1 | Climb | Medium | Ducks | No Cloud | 14,000 |
| 780 | 1 | DHC8 DASH 8 | 2 | Approach | Medium | Lesser scaup | No Cloud | 123,476 |
| 2636 | 1 | CIRRUS SR 20/22 | 1 | Climb | Medium | Anhinga | Overcast | 0 |
| 2779 | 1 | C-402 | 2 | Climb | Large | Black vulture | Some Cloud | 0 |
| 5822 | 1 | BE-95 | 2 | Approach | Large | Black vulture | No Cloud | 26,553 |
| 7047 | 1 | C-172 | 1 | Approach | Large | Turkey vulture | No Cloud | 570 |
| 8453 | 1 | PA-60 601 | 2 | Climb | Large | Black vulture | No Cloud | 9,878 |
| 12250 | 1 | C-210 CENTUR | 1 | Approach | Large | Black vulture | No Cloud | 0 |
| 16114 | 1 | C-172 | 1 | Approach | Large | Black vulture | No Cloud | 0 |
| 17066 | 1 | ERCO 415 | 1 | Climb | Medium | Unknown bird - medium | No Cloud | 27,057 |
Visualizing correlation matrix heatmap.
Observations¶
- Approximately 0.05% of incidents resulted in injuries to people.
- There is a weak correlation between "PeopleInjured" and these other factors: "NumberStruckActual", "Engines", "CostAsFloat".
- Interestingly, the incident that resulted in the highest number of injuries involved small birds.
Correlation¶
The data shows little correlation between the fact that people got injured and other factors.
Although the dataset is highly usable and provides valuable insights, there are some gaps that limit a deeper understanding of the trends. For example, we observed an increase in incident occurrences over time, but it's unclear whether this is due to the growing number of planes in operation, urban expansion pushing into bird habitats, or potential changes in aircraft engine design.
While the data offers a strong foundation for analysis, it raises several questions that can only be answered with additional external data. Factors such as the number of aircraft, environmental changes, and even advancements in aviation technology could all influence the trends seen in this dataset. To fully comprehend the underlying causes and patterns, more comprehensive data will be needed to explore these variables in greater detail.
Research Questions¶
The research questions for this project focus on the data, aiming to identify correlations and trends through feature analysis.
1. Damage Analysis: How do bird strikes affect flight operations and result in damage to aircraft?¶
Despite the frequency of bird strike incidents, the damage ratio does not appear to be a major concern overall. While the number of incidents is high, only a small fraction of these (about 9.65%) result in damage. Most incidents are minor and do not cause any lasting harm to the aircraft. There have been a few isolated costly incidents, and interestingly, the most costly one involved a white-tailed deer rather than a bird. This suggests that while bird strikes are common, the severity and consequences in terms of damage are relatively low for the majority of cases. The data indicates that bird strikes do not often lead to serious disruptions in flight operations.
2. Flight Phase: During which phases of flight are bird strikes most likely to happen?¶
Bird strikes are most likely to occur during the Approach phase of flight, when aircraft are flying at lower altitudes and are closer to airports, increasing the chances of encountering birds. The data also shows that the remaining incidents are spread fairly evenly across other flight phases, such as Climb, Landing Roll, and Take-off Run. This suggests that bird strikes are not restricted to a single phase but can occur at various points during the flight. The low altitude during Approach is the likely cause, given the proximity to bird habitats around airports.
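A sketch of how the phase breakdown and the altitude argument could be checked together. The rows are toy values, and the phase labels follow the spelling used in this report, which may differ from the raw data.

```python
import pandas as pd

# Toy rows for illustration only; not values from the real dataset.
df = pd.DataFrame({
    "FlightPhase": ["Approach", "Approach", "Approach", "Climb",
                    "Landing Roll", "Take-off Run"],
    "Altitude": [800, 1200, 500, 2500, 0, 0],
})

# Number and share of incidents per flight phase.
phase_counts = df["FlightPhase"].value_counts()
phase_share = phase_counts / len(df)
most_common_phase = phase_counts.idxmax()

# Median altitude per phase, to relate strike frequency to height.
median_altitude = df.groupby("FlightPhase")["Altitude"].median()

print(most_common_phase)
print(phase_share)
print(median_altitude)
```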
3. Pilot Awareness: Is there a correlation between pilot awareness of bird strike risks and the severity of incidents?¶
It does not appear that pilot awareness has a significant proactive effect on the severity of bird strike incidents. The data shows little difference in the severity of incidents between cases where pilots were aware of the bird strike risk and those where they were not. This suggests that, while pilot awareness might be important for preventing some types of incidents, it may not be a decisive factor in minimizing the severity of bird strikes. In fact, damage-causing incidents do not show a marked difference in the Pilot Warned status, implying that other factors, such as the timing or circumstances of the incident, play a more significant role in determining its severity.
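One way to compare damage rates by warning status is a row-normalized cross-tabulation. The sketch below uses toy rows; the "Y"/"N" codes for "PilotWarned" and the "Damage" labels are assumptions.

```python
import pandas as pd

# Toy rows for illustration; the category labels are assumed.
df = pd.DataFrame({
    "PilotWarned": ["Y", "Y", "Y", "N", "N", "N"],
    "Damage": ["No damage", "Caused damage", "No damage",
               "No damage", "No damage", "Caused damage"],
})

# Each row sums to 1: the damage distribution within each warning group.
damage_by_warning = pd.crosstab(df["PilotWarned"], df["Damage"], normalize="index")
print(damage_by_warning)
```

Similar "Caused damage" shares in both rows would match the observation that the warning status makes little difference to severity.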
4. Model Propensity: Which model has a higher propensity to be implicated in bird strikes?¶
Aircraft models with two engines have shown a higher tendency to be involved in bird strikes. Specifically, the B-737-700, a popular two-engine model, had the highest number of incidents. This could be due to the large number of these aircraft in operation, as more frequent flights naturally increase the likelihood of encountering bird strikes. However, the higher incidence of bird strikes among two-engine models may also reflect the operational characteristics of these aircraft, such as their flight patterns or altitudes. Despite being a commonly used model, the frequency of bird strikes for these aircraft does not necessarily indicate a design flaw, but rather a statistical outcome of their widespread use.
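The model and engine counts can be obtained with simple frequency tables. This sketch uses toy rows; "A-320" and "B-747-400" are placeholder labels chosen for illustration.

```python
import pandas as pd

# Toy rows for illustration; model labels other than B-737-700 are placeholders.
df = pd.DataFrame({
    "MakeModel": ["B-737-700", "B-737-700", "A-320",
                  "B-737-700", "B-747-400", "A-320"],
    "Engines": ["2", "2", "2", "2", "4", "2"],
})

# Models with the most recorded strikes.
top_models = df["MakeModel"].value_counts().head(3)
# Strikes grouped by number of engines.
strikes_by_engines = df["Engines"].value_counts()

print(top_models)
print(strikes_by_engines)
```

These are raw counts. Without fleet size or flight-hour data, which this dataset does not include, they cannot separate exposure from any real difference in risk.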
5. Airport Incidents: Which U.S. airports experience the highest frequency of bird strikes?¶
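The analysis here is by origin state (the "OriginState" feature) rather than by individual airport. The states with the highest frequency of bird strikes are: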
California: 2,499 incidents
Texas: 2,445 incidents
Florida: 2,045 incidents
These states have a significantly higher number of bird strike incidents than others. The high frequency can likely be attributed to a combination of factors, including heavy air traffic and large bird populations in these regions. Additionally, urban and industrial expansion in these areas may contribute to more frequent encounters between aircraft and birds, especially around airports located close to wildlife habitats.
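A sketch of the state ranking on toy rows. Because "OriginState" has missing values in this dataset (24980 non-null out of 25429), the example also counts the gaps before ranking.

```python
import pandas as pd

# Toy rows for illustration only; not values from the real dataset.
df = pd.DataFrame({
    "OriginState": ["California", "Texas", "California", None,
                    "Florida", "Texas", "California"],
})

# Records with no origin state are excluded from the ranking below.
missing_states = df["OriginState"].isna().sum()
# value_counts() drops NaN by default and sorts in descending order.
top_states = df["OriginState"].value_counts().head(3)

print(missing_states)
print(top_states)
```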
6. Injury Analysis: What key factors contribute to incidents where people sustain injuries?¶
While injuries sustained during bird strike incidents warrant attention, the data does not reveal any clear correlation or pattern that points to a way of mitigating such occurrences. Notably, the incident resulting in the highest number of injuries involved small birds, which challenges the assumption that larger birds pose a greater risk. This suggests that preparing specifically for incidents involving particular bird sizes may not yield significantly better outcomes. Instead, a broader approach to prevention and safety measures might be more effective in addressing the varied nature of these incidents.
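A sketch of how the worst incident and the injuries per bird size could be looked up. The rows are toy values constructed to mirror the pattern described above, and the size labels "Small", "Medium", and "Large" are assumed.

```python
import pandas as pd

# Toy rows built to mirror the described pattern; not real records.
df = pd.DataFrame({
    "WildlifeSize": ["Small", "Medium", "Large", "Small", "Large"],
    "PeopleInjured": [0, 1, 2, 6, 0],
})

# The single incident with the most injuries.
worst_incident = df.loc[df["PeopleInjured"].idxmax()]
# Incident count, total injuries, and worst case per bird size.
injuries_by_size = df.groupby("WildlifeSize")["PeopleInjured"].agg(["count", "sum", "max"])

print(worst_incident)
print(injuries_by_size)
```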
Potential Use Cases¶
Three key potential use cases for the findings:¶
1. Airport Wildlife Management¶
Airports can use the data to enhance wildlife hazard management programs. Understanding that bird strikes are most frequent during the Approach phase allows airports to focus on improving bird deterrent systems and monitoring in these critical areas. Airports could implement more advanced technologies such as radar systems, sonic deterrents, or vegetation management around vulnerable flight paths to reduce the likelihood of bird encounters.
2. Pilot Training Programs¶
The data showing that pilot awareness does not significantly reduce the severity of incidents suggests a need for better reactive training. Flight schools and airlines could develop bird strike management courses that focus on how to handle incidents when they occur, especially during high-risk phases like approach. The training could also address emergency response protocols and strategies to reduce damage after a bird strike.
3. Aircraft Design and Engineering¶
Given that two-engine aircraft (e.g., B-737-700) are involved in the most bird strikes, aircraft manufacturers could focus on designing more bird strike-resistant features. For instance, stronger windscreens, more durable engines, and other structural improvements could help reduce the damage from bird strikes. Additionally, examining the engineering needs for popular two-engine models could lead to the development of new systems that better withstand bird collisions.
These use cases focus on improving safety, enhancing operational efficiency, and reducing risks for the aviation industry based on the insights derived from the data.